NASA Contractor Report 181930 
ICASE Report No. 89-73 


ICASE 


G 

< 



> 

z 

•H 

72 

> 


— i 



> 

> 

J—l 

tj? 

i 

P'S 

r 

o 

> 

m 

7) 

— 1 


i 


x 

H 1 

c 

m 

CL 

z 

H 



33 


71 




o 

o 

D 


w 

to 

> 



n 



<71 

“0 

AJ 

a 

> 

(0 


73 

X3 

t— < 

> 

O 

H 

n 


X 

n 

rr 


m 


4/> 

r- 


nri 


n-i 

c 

30 

o 

7? 

a 

> 


e. 

in c 

m 

m Z 

o 


n 

H 


G 

m 


Z 






H 



73 


a 

> 






z 



IT: 



a 



Cl 


O C 2 

N *3 so 

sx n c 

H* — * | 

£ \r* 

O ul 4> 

-si CO 

• 


rv 


PARALLEL PROJECTED VARIABLE METRIC 
ALGORITHMS FOR UNCONSTRAINED OPTIMIZATION 


T. L. Freeman 


Contract No. NAS 1-1 8605 
October 1989 

Institute for Computer Applications in Science and Engineering 
NASA Langley Research Center 
Hampton, Virginia 23665-5225 

Operated by the Universities Space Research Association 


NASA 

National Aeronautics and 
Space Administration 

Langley Research Center 

Hampton, Virginia 23665-5225 




PARALLEL PROJECTED VARIABLE METRIC ALGORITHMS 
FOR UNCONSTRAINED OPTIMIZATION 


T. L. Freeman* 


Center for Mathematical Software Resaerch 
University of Liverpool 
Liverpool, L69 3BX. 

and 

Department of Mathematics 
University of Manchester 
Manchester, M13 9PL 



ABSTRACT 


We review the parallel variable metric optimization algorithms of Straeter (1973) and van 
Laarhoven (1985) and point out the possible drawbacks of these algorithms. By including 
Davidon (1975) projections in the variable metric updating we can generalize Straeter’s 
algorithm to a family of parallel projected variable metric algorithms which do not suffer the 
above drawbacks and which retain quadratic termination. Finally we consider the numerical 
performance of one member of the family on several standard example problems and illustrate 
how the choice of the displacement vectors affects the performance of the algorithm. 
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1 Introduction 


In this paper we consider the problem of finding the unconstrained minimum of the nonlinear 
function of n variables /(*), where /(as) is twice continuously differentiable and * € RAn. 
In particular we consider the development of algorithms for the solution of the problem 
on a parallel computer. Function and gradient evaluations are usually considered to be 
the most computationally expensive part of an optimization algorithm and most parallel 
optimization algorithms include the simultaneous evaluation of the function, or the gradient 
vector, at a number of different points. The expectation is that this extra function or 
gradient information will result in an algorithm which converges more rapidly. For a survey 
of parallel optimization algorithms see Lootsma and Ragsdell (1988), Lootsma (1989) or 
Schnabel (1988). 

On a sequential computer, when the function /(as) and the gradient vector g(as) are 
available, but the Hessian matrix G(x) is not available, one of the most popular methods 
for finding an unconstrained local minimum is a variable metric (quasi-Newton) method (see 
Fletcher (1987), Gill, Murray and Wright (1981), Dennis and Schnabel (1983)). One way of 
adapting such an algorithm to a parallel computer is that considered by Straeter (1973) and 
van Laarhoven (1985). It is the further development of these ideas which is the subject of 
this paper. Alternative approaches to parallel variable metric methods have been suggested 
by Schnabel (1987) and Byrd, Schnabel and Shultz (1988a,b). 

In Section 2 we review the parallel variable metric method of Straeter (1973) and van 
Laarhoven (1985) and point out the possible drawbacks of the method. One way of avoiding 
these difficulties is to use suitably projected vectors in the variable metric updating, as 
suggested by Davidon (1975) in the context of a sequential variable metric algorithm. This 
leads to the family of parallel variable metric methods described in Section 3. In Section 4 we 
consider the numerical performance of these new algorithms on a collection of test examples. 

2 A parallel symmetric rank one algorithm 

The first attempt to develop a parallel variable metric algorithm is due to Straeter (1973). 
He proposed a parallel generalisation of the well-known symmetric rank-one (SRI) algorithm 
(see Fletcher (1987), p.51). On each iteration the algorithm evaluates the gradient vector 
at different displaced points, in parallel, and incorporates the gradient information thus 
obtained into the approximate inverse Hessian matrix H by a sequence of SRI updates. 
The expectation, which is justified by limited numerical experiments, is that this extra 
gradient information will result in an improved Hessian approximation which in turn will 
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result in an algorithm which requires fewer iterations for convergence. 

If we let H denote the approximate inverse Hessian matrix, then Straeter’s algorithm is 
given by the following steps. 

x. Select as Ao G RAn, and n linearly independent directions 6Xi,6Xa, . . . , 6Xn , and set 
H 0 = I and k = 0. 

a. Calculate, in parallel, V f(x Afc), V /(xAfc, x), V /(xAfc, a), . . . , V f(x\k, n ), where 
x\k, j = x\k + 6 Xj . 

With Vfc.o = H k, for j = 1,2, ... ,n, calculate 


V/(xAfc,i) — V/(xAfc), 

(2.1) 

= Xk, j 6 Xj, 

(2.2) 

r\k*jr\k.j\T 

J 7 A k , j A T r A k , j 

( 2 . 3 ) 


Set H = V* in and calculate the search direction 

s\k = -H k+x V f(xXk). 


( 2 . 4 ) 


3. Perform an approximate line search along a Afc to determine the steplength a\k. Set 

xAfc + x = as Afc + aXksXk, (2-5) 

k = k + 1 (2.6) 

and return to a. 

It can be shown that, for the positive definite quadratic function ?(*) = ^asATAx + 
bXTx + c and for an arbitrary initial point xAo, H x = AX — x and the first iteration will 
locate the minimum of q(x ) provided that a steplength of 1 is chosen. This result is to be 
expected since the SRI algorithm is known to generate the true (inverse) Hessian matrix 
after inexact line searches along n linearly independent directions (see Brodlie (1977)). 

The SRI updates (step a) of Straeter’s algorithm are performed sequentially, with V k j 
dependent on SXj,jXk,j and V k ,j- x . It is not possible to incorporate these updates simul- 
taneously since it can be shown that multiple secant updates are, in general, inconsistent 
with preserving symmetry of the approximate matrices (see Schnabel (1983)). 

van Laarhoven (1985) attempts to generalize Straeter’s ideas to the Huang family of 
updating formulae (Huang (1970)), but finds that, in general, the only symmetric formula 
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which has quadratic termination is Straeter’s parallel SRI formula. This is inevitable since 
the SRI formula is the only member of the Broyden family of updating formaulae which has 
quadratic termination without exact line searches (see Brodlie (1977)). 

van Laarhoven (1985) derives a parallel generalisation of Broyden’s rank one formula 
(Broyden (1965)). This results in an algorithm which has quadratic termination, but the 
approximate matrices are in general unsymmetric. 

One major drawback of the parallel algorithms of this section is that the approximate 
matrices H k are not guaranteed to be positive definite and may fail to exist if the denominator 
of the rank one correction is zero. In the next section we describe a parallel generalisation 
of Davidon’s projected updating formula (Davidon (1975)) which has quadratic termination 
and which guarantees the existence and positive definiteness of the approximate matrices. 

3 A family of parallel symmetric rank two algorithms 

In this section we obtain a family of parallel symmetric rank two algorithms, which have 
quadratic termination, by generalising Straeter’s ideas to the family of projected updating 
formulae of Davidon (1975). Serial variable metric algorithms based on Davidon s projected 
updating formulae can be shown to have quadratic termination without requiring exact line 
searches on each iteration. 

The obvious extension of Straeter’s ideas would be to use a projected updating formula 
to update V kij , j = 0, 1, . . . ,n - 1, (and thus H k ) in step a of the algorithm of Section 2. 

However to calculate the required projected vectors it is necessary to also update the inverse 
matrices V k jX — i , j — 0, 1, . . . ,n. This contrasts with the serial implementation where the 
special form of the gradient differences enables the projected vectors to be calculated without 
explicit knowledge of H A — i (see Davidon (1975), Shanno and Phua (1978a) and Freeman 
and Korner (1982)). Given that both the approximate Hessian matrix and its inverse are 
required it is efficient to update the L D L XT factors of an approximation B to the Hessian 
matrix itself. 

In addition, since the information is readily available, we use the same formula to update 
the approximate Hessian matrix after each line search. The resulting algorithm is as follows. 

i. Select x\o e RAn, and n linearly independent directions £Ai,£Aa,...,£An , and set 
B 0 = /(=> L 0 — I and D 0 = i) and k = 0, and calculate V/(xAo). 

a. Calculate, in parallel, V /(* Afc, i), V /(* Afc, a), . . . , V f(x Afc, n), where * Afc , j = xXk +6Xj 
With W ki0 = L k D k L k XT = B k , for j = 1, 2, . . . ,n, calculate 

7 Afc, j = V /(* Afc, j ) — V/(x Afc), (3.1) 
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and update the LDLAT factors of W k j_ 1 ► w k j using a projected symmetric 

rank two updating formula. 

Set L k+l D k+1 L k+x AT = B k+1 = W kn and calculate the search direction aAk, where 

L k+1 ~D k+1 T k+1 ATaAk = -V/(* Afe). (3.2) 

3’ Perform an approximate line search along s\k to determine the steplength a\k. Set 


x\k + 1 = xA k + aXksXk , (3-3) 

calculate V f(x Ak + i) and 

7 Afe = Vf(xAk +i) — Vf(xAk), (3.4) 

and update the LD LXT factors of B k+1 ► B k+l using a projected symmetric rank 

two updating formula. Set 

= B jt+ij (3-5) 

k = k + 1 ( 3 . 6 ) 

and return to a . 


If we omit the superfixes and suffices and let (1 denote an updated quantity then the 
projected rank two updating formula for W is given by 


„ yyXT WzzXTW 

WAt = W +- + <f>(zXTWz)wwXT, 


yXTz zXTWz 


where 


y 


W 


w = 


(3.7) 


yXTz zXTWz 

y = B 7 > 

z = PATS, 

P =y[yATWA-iy]A-iYATWA-i, 

Y = [u, v], 

and v is given by 

v = w 6 - 7 , (3.13) 

and uA||, one of the basis vectors for the next application of the updating formula, is given 

by 

uA0 = (»A T6)u - (uA T6)v. (3.14) 


(3.8) 

(3.9) 

(3.10) 

(3.11) 

(3.12) 
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If we set u = - V f(x\k) at the start of each iteration then the algorithm can be shown 
to have quadratic termination (see Davidon (1975), Powell (1977)). Further , provided that 
yXTz > 0 and (j) is suitably bounded below, the update maintains both symmetry and 
positive definiteness of W k ,j and hence B k . 

We update the LDLXT factors of W k j by using the formulae given in Fletcher and 
Powell (1974). Their composite t-method includes monitoring of the updating to ensure 
that rounding errors do not cause the matrices W k ,j to become indefinite. 

As noted in Freeman and Korner (1982), the vectors « and v simply provide a basis for a 
space which is orthogonal to the preceding updating directions. In order to make this basis 
well-scaled we normalise the vectors u and v in the W A — i metric, so that we define the 


normalised basis vectors 


u 


u = 


(uXTW A-x«)Ai 


(3.15) 

(3.16) 


(vXTW 

and use these vectors in place of u and v in (3.12). 

When u” and tT are linearly dependent in the W X — i metric the projected symmetric 
rank two update reduces to the symmetric rank one {SRI) update. Thus if 


1 - (uXTW A-iF)A 2 < e 


(3.17) 


where e is the machine precision, then a symmetric rank one update, based on 7 and 6 , is 
attempted. Similarly, if 

yXTz < e, (3.18) 

the projected symmetric rank two update is not guaranteed to maintain positive definiteness 
and a symmetric rank one update, based on 7 and 6 , is again attempted. In both cases 
this SRI update is abandoned if it would result in an indefinite Hessian matrix, a condition 
which can be recognised by the composite t-method which is used to perform the updating. 


4 Numerical Results 

In this section we illustrate the numerical performance of the algorithm of Section 3 by 
applying it to the set of test problems considered by Shanno and Phua (1978a). The results 
were obtained using the Amdahl 5890-300 at the Manchester Computing Centre using about 
14 decimal digit accuracy. Before considering the numerical results some of the details of 
the algorithm need to be clarified. 
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The line search of step 3. uses bracketing and interpolation as described in Section 2.6 
of Fletcher (1987) with the parameters a. = 0.99 and p = 0.01. The convergence condition 
of the overall algorithm is 

||«A*||a<e, (4-1) 

where in the examples of this section we take e = 10A— 7, or approximately the square root 
of the machine precision. The parameter (j> of (3.7) is taken to be 0 corresponding to a 
projected BFGS updating formula. 

We consider two alternative choices for the n linearly independent directions 6 Xj , j = 
1, 2, . . . ,n, of step a. of the algorithm. 

The algorithm, PARALLEL I, defines SXj as 

6Xj = reXj, (4.2) 


where eXj is the jXth column of the n x n identity matrix, and 


T = 


p, k = 0, 

p\\x Xk — xXk ~ 1 ||a, k > 1 


(4.3) 


Thus on the kXth iteration the magnitudes of the displacements on which the parallel updat- 
ing is based depend on the magnitude of the step taken by the algorithm on the (A; — 1)A th 
iteration. In the examples of Table 1 we take fi = 10 A— 2, except for the extended Rosenbrock 
function with n = 20, in which case we take fi = 10A — 4. 

The alternative algorithm, PARALLEL //, defines 


6Xj =TlXj/\\lXj\\ 2 , 


(4.4) 


where, on the k\th iteration, IX j is the j\th column of the n x n matrix L^X—T and r 
is given by (4.3). The justification for this choice is that the vectors IX j are mutually B * 
conjugate, since 

L kX i B ifL ifX T — D icy 

and D fc is diagonal. This choice of SXj is slightly more expensive since it involves the 
solution of n triangular systems of equations on each iteration. Note that, as in PARALLEL 
I, on the kXth iteration, k > 1, 


||tfAj|| 2 = ^t|| se A fc — xXk — i|| 2 ,j = 1,2,. ..,n. 


The results of Table 1 are obtained using the value p = 10A— 2 in (4.3). 

Table 1 includes the number of function evaluations and the number of iterations required 
to satisfy the convergence condition. *** indicates that the algorithm fails to satisfy the 
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convergence condition after 400 function evaluations. The starting values (initial point) are 
given in Table 1, except for the Mancino function for which the starting values are given in 
Shanno and Phua (1978b). 

These numerical results show that the parallel projected variable metric algorithms, PAR- 
ALLEL I and PARALLEL II, converge in less iterations (for some problems considerably 
less iterations) than the corresponding projected quasi-Newton algorithm. Of course each 
iteration of the parallel algorithms requires the evaluation of the gradient vector at the n 
displaced points ; we are assuming that a parallel MIMD computer will allow these gradient 
vectors to be evaluated simultaneously (indeed the gradient vectors could be evaluated con- 
currently with the line search of the previous iteration using the speculative evaluation ideas 
of Schnabel (1987). 

Choosing the displacement directions as the normalised columns of LX—T, PARAL- 
LEL II, results in an algorithm which is somewhat more efficient (in terms of iterations) 
than PARALLEL I, which chooses the displacement directions as the co-ordinate directions. 
PARALLEL II requires the solution of n triangular systems of linear equations on each it- 
eration; again a parallel MIMD computer will allow each of these triangular systems to be 
solved concurrently on the separate processors. 

5 Conclusions 

One of the major reservations about the parallel variable metric algorithm of Straeter (1965) 
is its use of the symmetric rank one (SRI) updating formula, which allows quadratic ter- 
mination of the algorithm to be established. In this paper we have generalized Straeter’s 
algorithm to use a projected symmetric rank two updating formula (Davidon (1975)) and 
have thus developed a family of parallel projected variable metric algorithms. These al- 
gorithms avoid the use of the SRI updating formula, yet retain the quadratic termination 
property of Straeter’s algorithm. 

Initial numerical testing, on a serial computer, indicates that these new parallel algo- 
rithms are more efficient (in terms of number of iterations required) than existing serial 
variable metric algorithms. For some problems, such as the extended Rosenbrock function 
of dimension 20, the parallel algorithms are considerably more efficient. For this reduced 
number of iterations to result in an algorithm which is more efficient (in terms of execu- 
tion time) on a parallel computer depends on the assumption that the cost of function and 
gradient vector evaluations dominate all other costs of the algorithm. 

The new algorithm can exploit parallel computing capabilities to evaluate the displaced 
gradient vectors; however it is unclear that the sequence of projected rank two updates could 
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exploit parallelism. The implementation and performance of the new parallel algorithms on 
a local memory MIMD computer will be reported in a separate paper. 
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FUNCTION 

Initial 

point 

PARALLEL I 1 

PARALLEL II 

PROJECTED 

Iterations 

Function 

evaluations 

Iterations 

Function 

evaluations 

Iterations 

Function 

evaluations 

Rosenbrock 2 







(-1.2,1) 

21 

22 

18 

21 

35 

50 

(2,-2) 

16 

17 

15 

18 

44 

62 

(-3.635,5.621) 

32 

34 

31 

42 

27 

41 

(6.39,-0.221) 

35 

40 

36 

48 

56 

81 

(1.489,-2.547) 

13 

14 

12 

14 

34 

52 

Rosenbrock 20 







(-1.2,1,...) 

23 

29 

27 

37 

150 

233 

(2,-2,...) 

37 

126 

31 

50 

124 

205 

(-3.635,5.621,...) 

64 

170 

54 

164 

99 

166 

(6.39,-0.221,...) 

118 

313 

64 

170 

*** 

*** 

(1.489,-2.547,...) 

22 

43 

18 

41 

138 

227 

Wood 4 







(-3,-1, -3,-1) 

37 

42 

35 

48 

33 

59 

(-3,1, -3,1) 

38 

50 

32 

41 

34 

61 

(-1.2,1, -1.2,1) 

33 

41 

32 

40 

70 

106 

(-1.2,1, 1.2,1) 

21 

26 

19 

22 

46 

69 

Oren 20 







(i.i i) 

43 

43 

43 

43 

96 

201 

Powell 4 







(-3, -1,0,1) 

47 

48 

42 

42 

72 

87 

Mancino 10 

4 

4 

4 

4 

13 

62 

Mancino 20 

5 

5 

4 

4 

29 

127 

Mancino 30 

5 

5 

5 

5 

33 

164 


Table 1: Numerical Performance of Parallel variable metric algorithms 
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